GH-3451. Add a JMH benchmark for variants#3452

Open
steveloughran wants to merge 10 commits into apache:master from steveloughran:pr/benchmark-variant

Conversation

@steveloughran
Contributor

Rationale for this change

There's no benchmark for variant IO, so there's no way to identify problems that exist now, nor any way to detect regressions.

What changes are included in this PR?

  • adds parquet-variant to parquet-benchmark dependencies
  • new JMH benchmark VariantBenchmark

Are these changes tested?

Manually; the initial PR doesn't fork the JVM for each option.

Benchmark                               (depth)  (fieldCount)  Mode  Cnt      Score      Error  Units
VariantBenchmark.benchmarkBuildVariant  Shallow          1000    ss  100   2007.783 ±  244.861  us/op
VariantBenchmark.benchmarkBuildVariant  Shallow         10000    ss  100  16214.358 ± 1048.142  us/op
VariantBenchmark.benchmarkBuildVariant   Nested          1000    ss  100   1544.472 ±   91.232  us/op
VariantBenchmark.benchmarkBuildVariant   Nested         10000    ss  100  15312.341 ±  226.414  us/op
VariantBenchmark.benchmarkDeserialize   Shallow          1000    ss  100    893.913 ±   36.284  us/op
VariantBenchmark.benchmarkDeserialize   Shallow         10000    ss  100   8499.729 ±  197.003  us/op
VariantBenchmark.benchmarkDeserialize    Nested          1000    ss  100    907.712 ±   80.187  us/op
VariantBenchmark.benchmarkDeserialize    Nested         10000    ss  100   8450.447 ±  163.247  us/op
VariantBenchmark.benchmarkSerialize     Shallow          1000    ss  100     16.095 ±   29.358  us/op
VariantBenchmark.benchmarkSerialize     Shallow         10000    ss  100      6.416 ±    7.178  us/op
VariantBenchmark.benchmarkSerialize      Nested          1000    ss  100      3.777 ±    0.528  us/op
VariantBenchmark.benchmarkSerialize      Nested         10000    ss  100      3.956 ±    0.536  us/op
VariantBenchmark.writeShredded          Shallow          1000    ss  100   1943.923 ±  103.121  us/op
VariantBenchmark.writeShredded          Shallow         10000    ss  100  20139.185 ±  341.913  us/op
VariantBenchmark.writeShredded           Nested          1000    ss  100   1920.326 ±   42.812  us/op
VariantBenchmark.writeShredded           Nested         10000    ss  100  20980.458 ±  539.303  us/op
VariantBenchmark.writeUnshredded        Shallow          1000    ss  100     29.876 ±   44.216  us/op
VariantBenchmark.writeUnshredded        Shallow         10000    ss  100     17.380 ±   39.148  us/op
VariantBenchmark.writeUnshredded         Nested          1000    ss  100      3.254 ±    1.061  us/op
VariantBenchmark.writeUnshredded         Nested         10000    ss  100     16.602 ±   33.320  us/op

There are 100 iterations per benchmark because some of the unshredded/small-object operations are so fast that clock granularity becomes an issue.
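As a side note on the granularity issue, here is an illustrative sketch (not code from this PR) of why averaging over many invocations helps: bracketing one sub-microsecond operation with `System.nanoTime()` mostly measures the timer itself, while timing a batch and dividing gives a stable per-op figure. JMH's repeated iterations achieve the same averaging effect.

```java
// Illustrative sketch, not from this PR: timing a batch of invocations and
// dividing by the batch size averages out clock granularity and call overhead.
public class BatchTiming {
    // Mean cost per invocation of op, in nanoseconds, over `batch` runs.
    public static long nanosPerOp(Runnable op, int batch) {
        long start = System.nanoTime();
        for (int i = 0; i < batch; i++) {
            op.run();
        }
        return (System.nanoTime() - start) / batch;
    }

    public static void main(String[] args) {
        // A deliberately tiny operation: one square root per invocation.
        long perOp = nanosPerOp(() -> Math.sqrt(42.0), 1_000_000);
        System.out.println(perOp + " ns/op");
    }
}
```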

Are there any user-facing changes?

No

Closes #3451

@steveloughran steveloughran marked this pull request as draft March 19, 2026 10:45
@steveloughran
Contributor Author

Still thinking of what else can be done here...suggestions welcome.

Probably a real write to the localfs and read back in

@steveloughran
Contributor Author

I'll add a "deep" option too, for consistency with the Iceberg PR.

@steveloughran steveloughran marked this pull request as ready for review March 24, 2026 14:58
private static int count() {
int c = counter++;
if (c >= 512) {
c = 0;

only resets the local copy, counter keeps growing?

Contributor Author

@steveloughran steveloughran Mar 30, 2026


good point. will fix.
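The reviewer's point can be sketched as follows (hypothetical class name, not the PR's final code): resetting only the local copy `c` leaves the static `counter` field growing forever, so after the 512th call every subsequent call would return 0. The field itself has to be reset.

```java
// Hypothetical sketch of the fix, not the PR's final code:
// reset the field itself, not just the local copy.
public class WrappingCounter {
    private static int counter = 0;

    static int count() {
        int c = counter++;
        if (c >= 512) {
            counter = 1; // wrap the field; this call returns 0, the next returns 1
            c = 0;
        }
        return c;
    }
}
```

With the field reset, the returned values cycle through 0..511 indefinitely instead of collapsing to a constant 0 once the counter passes 512.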

* deser to recurse down
* include uuid and bigdecimal
* reset counter on benchmark setup
iterations of class code and #of rows are the same
for easy compare of overheads.
Using the same structure as the iceberg tests do
@steveloughran
Contributor Author

There's now a new benchmark which writes a file using the same simple schema as I'm doing in Iceberg (apache/iceberg#15629), and tries to do a projection on it.

 SELECT id, category, variant_get('nested.varcategory') FROM table

Review by Copilot:


Setup: 1M rows, 4-field nested variant (idstr, varid, varcategory, col4), querying varcategory only. SingleShotTime, 15 iterations, @Fork(0).

Raw Results


  ┌───────────────────────────┬──────────┬───────────────┬─────────┬────────┐
  │ Benchmark                 │ shredded │ Score (ms/op) │ Error   │ µs/row │
  ├───────────────────────────┼──────────┼───────────────┼─────────┼────────┤
  │ readAllRecords            │ false    │ 728.514       │ ±11.253 │ 0.729  │
  ├───────────────────────────┼──────────┼───────────────┼─────────┼────────┤
  │ readProjectedFileSchema   │ false    │ 760.287       │ ±3.314  │ 0.760  │
  ├───────────────────────────┼──────────┼───────────────┼─────────┼────────┤
  │ readProjectedLeanSchema   │ false    │ 1405.264      │ ±8.399  │ 1.405  │
  ├───────────────────────────┼──────────┼───────────────┼─────────┼────────┤
  │ readAllRecords            │ true     │ 1315.615      │ ±14.598 │ 1.316  │
  ├───────────────────────────┼──────────┼───────────────┼─────────┼────────┤
  │ readProjectedFileSchema   │ true     │ 1297.870      │ ±19.621 │ 1.298  │
  ├───────────────────────────┼──────────┼───────────────┼─────────┼────────┤
  │ readProjectedLeanSchema   │ true     │ 725.618       │ ±10.574 │ 0.726  │
  └───────────────────────────┴──────────┴───────────────┴─────────┴────────┘

Speedup/Penalty vs readAllRecords Baseline


  ┌───────────────────────────┬──────────────────┬──────────────────┐
  │ Benchmark                 │ shredded=false   │ shredded=true    │
  ├───────────────────────────┼──────────────────┼──────────────────┤
  │ readProjectedFileSchema   │ −4% (overhead)   │ +1% (noise)      │
  ├───────────────────────────┼──────────────────┼──────────────────┤
  │ readProjectedLeanSchema   │ −93% penalty     │ +45% speedup     │
  └───────────────────────────┴──────────────────┴──────────────────┘
  • Lean schema projection is the only technique that skips columns. Projecting the full file schema (readProjectedFileSchema) gives zero benefit in either case — Parquet still reads all column chunks.
  • Lean schema + shredded = 45% faster than reading all columns. Skipping idstr, varid, and col4 typed columns saves ~590ms per 1M rows.
  • Lean schema + unshredded = 93% slower. The lean schema requests typed_value.varcategory which does not exist in the unshredded file. Parquet handles the missing columns at every row, which is more expensive than
    reading the single binary blob directly.
  • Schema detection in ReadSupport.init() is essential. Applying containsField("typed_value") to choose between lean and full schema prevents the unshredded penalty while preserving the shredded speedup.

Recommendation

Always detect file layout in ReadSupport.init() and apply the lean projection only when the file was written with a shredded schema. For unshredded files, use the full file schema or no projection.

If you have a query with a pushdown predicate that wants to look inside a variant, creating a MessageType schema referring to the shredded values is counterproductive unless you know that the variant is shredded.

That can be determined by looking at the schema and using `containsField("typed_value")` to see if the target variant has any nested values.

    @Override
    public ReadContext init(InitContext context) {
      MessageType fileSchema = context.getFileSchema();
      GroupType nested = fileSchema.getType("nested").asGroupType();
      if (nested.containsField("typed_value")) {
        return new ReadContext(VARCATEGORY_PROJECTION);
      }
      // Unshredded file: projection designed for typed columns provides no benefit and
      // causes schema mismatch overhead — fall back to the full file schema.
      return new ReadContext(fileSchema);
    }

@steveloughran
Contributor Author

Build failures are all because the Java 11 javadoc is fussier than the versions on either side of it.
